In this Bite you will write a function that pairs filenames with each other. This is useful in bioinformatics, because some current sequencing technologies produce two paired files for each sample, and those files need to be processed together.
The filenames have the following naming scheme:
File 1:
SampleName_S1_L001_R1_001.fastq.gz
File 2:
SampleName_S1_L001_R2_001.fastq.gz
A pair always consists of an R1 and an R2 file. The SampleName and all numbers are variable, but the overall structure is always the same:
- SampleName can contain letters, numbers or special characters (including _)
- The number following S runs from 1 to 99
- The number following L runs from 001 to 999
- R1 stands for file 1 and R2 for file 2 of a pair (no other numbers are allowed)
- The last number block runs from 001 to 999
- The file name must end in fastq.gz (no extra extensions such as fastq.gz.md5)

Your task
- Write a function pair_files(filenames) that receives a list of filenames and returns a list of tuples, where each tuple holds one pair of filenames in the order (filename1-R1, filename2-R2)
- Ignore filenames that do not match the naming scheme (even if they contain R1 and R2)
- Matching the filenames should be case insensitive, but the function should return each filename in its original case
- For the tests presented here, you can assume that at most one file can be paired with another (no triplets or larger groups)

Example
# Two complete pairs, one file without partner
>>> filenames = [
...     "Sample1_S1_L001_R1_001.FASTQ.GZ", "Sample1_S1_L001_R2_001.fastq.gz",
...     "Sample2_S2_L001_R1_001.fastq.gz", "sample2_s2_l001_r2_001.fastq.gz",
...     "Sample3_S3_L001_R1_001.fastq.gz",
... ]
>>> pair_files(filenames)
[('Sample1_S1_L001_R1_001.FASTQ.GZ', 'Sample1_S1_L001_R2_001.fastq.gz'),
('Sample2_S2_L001_R1_001.fastq.gz', 'sample2_s2_l001_r2_001.fastq.gz')]
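One possible approach, sketched here under our own assumptions and not the official solution to this Bite: a regular expression compiled with re.IGNORECASE encodes the naming scheme, and every part of a matching name except the R1/R2 digit acts as the pairing key. The names PATTERN and groups, the regex group names, and the choice to lowercase only the sample name when building the key are hypothetical illustrations.

```python
import re

# Hypothetical pattern for the naming scheme above (group names are illustrative).
# re.IGNORECASE makes S/L/R and "fastq.gz" match in any case.
PATTERN = re.compile(
    r"(?P<sample>.+)"             # SampleName: any characters, underscores allowed
    r"_S(?P<s>[1-9]\d?)"          # S1 .. S99
    r"_L(?P<lane>(?!000)\d{3})"   # L001 .. L999
    r"_R(?P<read>[12])"           # R1 or R2 only
    r"_(?P<chunk>(?!000)\d{3})"   # last block: 001 .. 999
    r"\.fastq\.gz",               # fullmatch below rejects extra extensions
    re.IGNORECASE,
)


def pair_files(filenames):
    """Return (R1, R2) tuples for filenames that follow the naming scheme."""
    groups = {}  # pairing key -> {"1": filename, "2": filename}, insertion-ordered
    for name in filenames:
        match = PATTERN.fullmatch(name)
        if match is None:
            continue  # ignore names that break the scheme
        # Everything except the read digit identifies a pair; the sample name is
        # lowercased so that matching is case insensitive (digits have no case).
        key = (match["sample"].lower(), match["s"], match["lane"], match["chunk"])
        groups.setdefault(key, {})[match["read"]] = name
    # Keep only complete pairs, in the order their first file appeared.
    return [
        (reads["1"], reads["2"])
        for reads in groups.values()
        if "1" in reads and "2" in reads
    ]
```

Because the original filenames are stored unchanged, the returned tuples keep their input case. If the same R1 or R2 slot were filled twice, the later name would overwrite the earlier one; the Bite's "at most one partner" assumption rules that case out.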